Abstract--- Content-Based Image Retrieval (CBIR) uses
the visual contents of an image, such as color, shape,
texture and spatial layout, to represent and index the
image. CBIR is the process of retrieving images from a
database or library of digital images according to the
visual content of the images. Image retrieval is an
essential process in real-world web applications, where
most users attempt to retrieve images by submitting
label keywords. The image retrieval process is enhanced
to improve retrieval accuracy by retrieving images based
on the visual information present in them instead of the
labelling information. Feature extraction is then
carried out for image retrieval. Segmentation is the
process of partitioning an image into multiple segments.
Content-based image retrieval is done efficiently by
using a combination of texture and shape features. The
Gustafson-Kessel algorithm is used for segmentation to
improve the retrieval accuracy. Texture features are
extracted from the segmented images, and the Hausdorff
distance is calculated as the similarity measure. Based
on the similarity value, images in the database are
retrieved, and the performance is evaluated on the Corel
image database. Accuracy, precision and recall are
compared with those of existing models, and the method
is implemented in MATLAB.

I. INTRODUCTION

An image retrieval system is a computer system for
browsing, searching and retrieving images from a large
database of digital images. Most traditional and common
methods of image retrieval add metadata such as captions,
keywords or descriptions to the images so that retrieval can
be performed over the annotation words. Manual image
annotation is time-consuming, laborious and expensive; to
address this, a large amount of research has been done on
automatic image annotation.

Text-based image retrieval (TBIR) stores text in the form of
keywords together with the image. Some TBIR systems use the
text surrounding an image to search for keywords that are
physically close to the image. A content-based image
retrieval system makes direct use of the content of the image
rather than relying on human annotation of metadata with
keywords. At present, CBIR makes use of low-level features
such as shape, color and texture to retrieve the desired
image from the database. To obtain efficient image retrieval,
tools from pattern recognition and statistics are widely
used. Different implementations of CBIR make use of
different types of queries.

Region-based image retrieval is an extension of
content-based image retrieval techniques. A region-based
image retrieval system provides new query types to search for
objects embedded in an arbitrary environment. Such a system
automatically segments images into a variable number of
regions and uses a segmentation algorithm to extract a set of
features for each region. Context-based image retrieval is a
comparatively new approach to image retrieval. Context is
any information that can be used to characterize the situation
of an entity. Context is the where, who, what and when of an
object.

Texture can be defined as a visual pattern that has
properties of homogeneity not resulting from the presence of
only a single color or intensity. Various techniques for
texture analysis have been investigated in the fields of
computer vision and pattern recognition. Texture extraction
techniques can be classified into two categories: statistical
and structural. Statistical approaches use the intensity
distribution of the image to extract statistical parameters
representing its texture. Commonly used statistical methods
include Fourier power spectra, co-occurrence matrices,
shift-invariant principal component analysis (SPCA), Tamura
features, Wold decomposition, Markov random fields, fractal
models, and multi-resolution filtering techniques such as
Gabor and wavelet transforms.
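
As an illustration of one statistical method from the list
above, the following is a minimal sketch of a gray-level
co-occurrence matrix and a contrast feature derived from it.
The function names, the 4x4 toy image and the choice of a
horizontal neighbour offset are assumptions made for this
example only; they are not taken from the implementation
evaluated in this paper.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """glcm[i][j] counts pixels of gray level i whose neighbour at
    offset (dx, dy) has gray level j."""
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r][c]][image[r2][c2]] += 1
    return glcm


def contrast(glcm):
    """Contrast: (i - j)^2 weighted by the normalized co-occurrence counts."""
    total = sum(sum(row) for row in glcm)
    return sum(((i - j) ** 2) * count / total
               for i, row in enumerate(glcm)
               for j, count in enumerate(row))


# Toy 4x4 image with gray levels 0..3 (illustrative values only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm = cooccurrence_matrix(image, levels=4)
print(glcm)
print(contrast(glcm))
```

A smooth region concentrates counts on the diagonal of the
matrix and gives a low contrast, while a busy texture spreads
counts away from the diagonal.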

II. LITERATURE SURVEY 

Nadia [1] observed that retrieving and distributing
multimedia data has become a frequent but still challenging
task, in particular retrieving data from large-scale
multimedia databases with satisfactory accuracy and
performance. The advent of large-scale multimedia databases
has led to great challenges in content-based image retrieval
(CBIR). The work gives an overview of the statistical
methodologies and techniques employed for texture feature
extraction using the most popular spatial-frequency image
transforms, namely discrete wavelets, Gabor wavelets,
dual-tree complex wavelets and contourlets. It does not
achieve high accuracy.

Gauri [2] explained that in Content Based Image Retrieval
(CBIR) the term "content based" means that the search
analyzes the actual contents (features) of the image. Two
types of features are present in an image: low-level features
and high-level features. It is difficult to extract
high-level features such as emotions or the different
activities present in the image, but they give relatively
more information about the objects and scenes in the images
as perceived by human beings. The low-level features to be
used depend upon the application. The work does not consider
shape features to improve the accuracy level.

Gode [3] described that "content-based" means that the
search analyses the contents of the image rather than the
metadata. Metadata refers to keywords, tags or descriptions
associated with the image. To integrate the above
combination, clustering based on image properties is applied
to create the co-occurrence matrix. The co-occurrence matrix
is used to calculate the feature vector for texture. The
Canny algorithm is used for edge detection to calculate the
feature vector for shape. Invariant moments are then used to
record the shape features. Optimal feature extraction is
still under investigation.

Usha [4] developed an image retrieval approach that is
effective and efficient for managing extensive image
databases. In that system, content-based image retrieval is
accomplished using features. The extracted feature vector of
the query image is compared with the extracted feature
vectors of the database images to obtain similar images. The
main objective of the work is the classification of images
using the SVM algorithm. The support vector machine (SVM) is
a supervised machine learning method that examines the data
and identifies patterns, and it is used for classification.
The advantage of this algorithm is that it classifies the
input query object based on feature vectors and training
samples. Optimal classification methods are required to
improve the precision and recall rates.

Srinivasa [5] noted that the shape of an object is a
binary image representing the extent of the object. The
system proposes a method to compute the exact values of the
moments by mathematically integrating the Legendre
polynomials over the corresponding intervals of the image
pixels. Experimental results show that the values obtained
match those calculated theoretically and that the image
reconstructed from these moments has a lower error than that
of the conventional methods for the same order. The support
vector machine is suitable only for smaller databases; it is
not suitable for large databases.

Mahantesh [6] discussed content-based image retrieval
as the application of computer vision techniques to the image
retrieval problem, specifically the search for specific
digital images in large databases. The evaluation of the
proposed approach is carried out using the standard precision
and recall measures. It achieves high retrieval accuracy.

Amit [7] noted that techniques which use visual contents
to find a desired image in a collection of databases have
wide applications. The system presents an algorithm for
retrieving images with respect to a database consisting of
engineering/computer-aided design (CAD) models. The work does
not investigate whether the proposed shape representation is
useful in other application domains, such as protein search
in molecular biology.

Mahdi [8] developed a novel technique for content-based
image retrieval based on tree matching. Image objects and
their relations are some of the important features for
matching similar images. The algorithm segments the image
into specific regions and then extracts their color, size,
position, shape and the relations between objects. It
computes the center of each segment and connects the centers
to obtain the image graph. After obtaining the minimum
spanning tree, tree matching is used to compare the images
in the database against the sample image. Images with
complicated textures may, in rare cases, cause failures.

III. DRAWBACKS OF EXISTING SYSTEM 

Content-based retrieval has been done with the
consideration of the texture feature. Various techniques have
been adopted to retrieve the contents stored in the database
accurately. Some limitations of the previous research are
given here. In the previous research work, only texture and
color features are considered for calculating the similarity
between the query image and the images stored in the
database, which might lead to less accurate retrieval of the
images. Older classification methodologies are used for
grouping similar kinds of images, which might also lead to
less accurate retrieval of contents. The tree matching
process used in the existing work might also lead to higher
time complexity.

IV. METHODOLOGY 

The existing research methodology is discussed in
detail first. In the existing system, content-based retrieval
is used to retrieve images and return them to users according
to their requirements. Retrieving images inaccurately might
lead to failed content downloads, which must be avoided.
K-means clustering and fuzzy C-means clustering are used for
segmenting the images. Here the texture and shape features
are used for retrieval.

4.1 K-MEANS CLUSTERING 

K-means clustering is a method of vector 
quantization, originally from signal processing, that is popular 
for cluster analysis in data mining. K-means clustering aims to 
partition n observations into k clusters in which each 
observation belongs to the cluster with the nearest mean, 
serving as a prototype of the cluster. The problem is
computationally difficult (NP-hard); however, there are
efficient heuristic algorithms that are commonly employed 
and converge quickly to a local optimum. These are usually 
similar to the expectation-maximization algorithm for 
mixtures of Gaussian distributions via an iterative refinement 
approach employed by both algorithms. Additionally, they 
both use cluster centers to model the data; however, k-means 
clustering tends to find clusters of comparable spatial extent, 
while the expectation-maximization mechanism allows 
clusters to have different shapes. 
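
The iterative refinement described above can be sketched as
follows. This is a minimal illustration on scalar pixel
intensities, assuming user-supplied initial centers; the
function name `kmeans_1d` and the sample values are
hypothetical and are not part of the system evaluated in
this paper.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's heuristic for k-means on scalar values.

    `centers` is the initial guess; the result is a local optimum.
    """
    centers = list(centers)

    def nearest(v):
        # Index of the center closest to value v.
        return min(range(len(centers)), key=lambda k: abs(v - centers[k]))

    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest mean.
        clusters = [[] for _ in centers]
        for v in values:
            clusters[nearest(v)].append(v)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:  # no change: converged
            break
        centers = new_centers
    labels = [nearest(v) for v in values]
    return centers, labels


# Hypothetical gray levels: dark, mid-gray and bright pixels.
pixels = [10, 12, 11, 200, 205, 198, 90, 95]
centers, labels = kmeans_1d(pixels, centers=[0, 128, 255])
print(centers, labels)
```

Because every value is assigned to exactly one cluster, the
result is a hard partition of the pixels.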

4.2 FUZZY C-MEANS CLUSTERING 

Fuzzy C-means clustering (FCM) is an iterative
algorithm that produces optimal partitions by minimizing an
objective function. FCM is a method of clustering which
allows one piece of data to belong to two or more clusters.
It is frequently used in pattern recognition.
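
A minimal sketch of fuzzy C-means on scalar values is given
below. It assumes the standard formulation, which is not
stated explicitly in this paper: the objective is the sum
over points and clusters of u^m * |x - c|^2, with fuzzifier
m > 1, and the membership and center updates are the usual
alternating ones. The function names and sample values are
hypothetical.

```python
def fcm_memberships(values, centers, m=2.0):
    """Membership of every value in every cluster for the given centers."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for x in values:
        # Clamp distances so a value lying exactly on a center
        # does not cause a division by zero.
        dists = [max(abs(x - c), 1e-12) for c in centers]
        table.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                      for dj in dists])
    return table


def fcm_1d(values, centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        u = fcm_memberships(values, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of the data weighted by u^m.
            weights = [row[j] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(values, centers, m)


values = [1, 2, 3, 10, 11, 12]
centers, u = fcm_1d(values, centers=[0.0, 5.0])
print(centers)
print(u[0])  # memberships of the first value in both clusters
```

Unlike K-means, every value receives a degree of membership
in every cluster, and the memberships of one value sum to 1.
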

4.3 PROPOSED METHODOLOGY

The present research is carried out in two phases. In
the first phase, the image is pre-processed for better image
retrieval. Then feature extraction is carried out. Among the
various filters considered in the pre-processing stage, the
spatial noise filter gives the best performance for noise
removal. Hence, the spatial noise filter is applied for
pre-processing the image. Segmentation is the process of
partitioning an image into multiple segments. Content-based
image retrieval is done efficiently by using the combination
of texture and shape features. Here, the Gustafson-Kessel
algorithm and fuzzy shell clustering are used for
segmentation, in place of K-means and fuzzy C-means, to
obtain higher retrieval accuracy and more efficient results.
The shape and texture features are extracted and combined for
the retrieval. Finally, the features of the query are
compared to those of the images in the database in order to
rank each image according to its distance to the query. The
proposed methodology is described in detail in the following
subsections.
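
The final ranking step can be sketched as follows. This is a
minimal illustration which assumes that the shape and texture
features of each image have already been extracted and
concatenated into one numeric vector, and that the Euclidean
distance is used for ranking. The image names and feature
values are hypothetical.

```python
import math


def rank_by_distance(query_features, database):
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, math.dist(query_features, features))
              for name, features in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical combined shape + texture feature vectors.
database = {
    "lion_01": [0.9, 0.1, 0.4],
    "deer_07": [0.2, 0.8, 0.5],
    "tiger_03": [0.8, 0.2, 0.3],
}
query = [0.85, 0.15, 0.4]

ranking = rank_by_distance(query, database)
for name, distance in ranking:
    print(name, round(distance, 4))
```

The images at the top of the ranking are returned to the user
as the retrieval result.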




Figure 4.1 Process Flow

4.3.1 Preprocessing 

There are numerous tasks to be completed before
performing image retrieval. An image must be scanned and
converted into a gray scale image for pre-processing.
Pre-processing consists of several sub-processes that clean
the image and make it suitable for accurate image retrieval.
Noise can reduce the efficiency of the image retrieval
system. Noise may occur due to the poor quality of the image
or may accumulate during scanning, but whatever its cause, it
should be removed before further processing. We have used
spatial filtering to remove the noise from the image.
Generally, filters are used to remove unwanted content in the
spatial domain. In digital image processing, images are
often affected by various kinds of noise. The main objective
of the filters is to improve the quality of the image and the
interpretability of the information present in it for human
viewers.

4.3.1.1 Spatial Noise Filtering

When an image is transferred, transmission problems
sometimes cause a signal to spike, resulting in one of the
three point scalars transmitting an incorrect value. This
type of transmission error is called "salt and pepper" noise
due to the bright and dark spots that appear on the image as
a result of the noise. The ratio of incorrectly transmitted
points to the total number of points is referred to as the
noise composition of the image. The goal of a noise removal
filter is to take a corrupted image as input and produce an
estimate of the original with no foreknowledge of the
characteristics of the noise or the noise composition of the
image. In images containing noise, there are two challenges.
The first challenge is determining the noisy points. The
second challenge is determining how to adjust these points.
In the Vector Median Filter (VMF), a point in the signal is
compared with the points surrounding it as defined by a
filter mask. Each point in the filter mask is treated as a
vector representing a point in a three-dimensional space.
For each of these points, the summed vector distance to every
other point within the filter is computed. The point with the
smallest summed vector distance among the points in the
filter is the vector median. The Spatial Median Filter is a
new noise removal filter. The Spatial Median Filter and the
Vector Median Filter follow a similar algorithm and give
comparable results. To improve the quality of the results of
the Spatial Median Filter, an additional parameter is
introduced. The Spatial Median Filter is a uniform smoothing
algorithm with the purpose of removing noise and fine points
of image data while maintaining edges around larger shapes.
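
The vector median selection described above can be sketched
as follows. This is a minimal illustration of the Vector
Median Filter with a 3x3 mask on RGB pixels; it assumes that
border pixels are simply copied, and it does not include the
additional parameter of the Spatial Median Filter. The
function names and the test image are hypothetical.

```python
import math


def vector_median(window):
    """Pixel whose summed distance to all pixels in the window is smallest."""
    return min(window, key=lambda p: sum(math.dist(p, q) for q in window))


def vector_median_filter(image):
    """Apply a 3x3 vector median filter; border pixels are copied unchanged."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = vector_median(window)
    return out


# Uniform gray patch with one "salt" pixel in the middle.
image = [[(100, 100, 100)] * 3 for _ in range(3)]
image[1][1] = (255, 255, 255)

filtered = vector_median_filter(image)
print(filtered[1][1])
```

Because the output is always one of the pixels already inside
the mask, the filter does not introduce new colors into the
image.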

4.3.1.2 Segmentation

Segmentation refers to the process of partitioning a
digital image into multiple segments. Image segmentation is
typically used to locate objects and boundaries (lines,
curves, etc.) in images. The Gustafson-Kessel algorithm
associates each cluster with both a point and a matrix,
respectively representing the cluster centre and its
covariance. Whereas the original fuzzy c-means makes the
implicit hypothesis that clusters are spherical, the
Gustafson-Kessel algorithm is not subject to this constraint
and can identify ellipsoidal clusters. The cluster centre is
computed as a weighted mean of the data, the weights
depending on the considered algorithm, as detailed in the
following. This cluster parameter updating step is alternated
with the update of the weighting coefficients until a
convergence criterion is met.

4.3.4 Gustafson-Kessel Algorithm 

The Gustafson-Kessel algorithm associates each cluster
with both a point and a matrix, respectively representing the
cluster centre and its covariance. Whereas the original fuzzy
c-means makes the implicit hypothesis that clusters are
spherical, the Gustafson-Kessel algorithm is not subject to
this constraint and can identify ellipsoidal clusters. The
covariance matrix is defined as a fuzzy equivalent of the
classic covariance. A size constraint is imposed on the
covariance matrix, whose determinant must be 1. As a
consequence, the Gustafson-Kessel algorithm can identify
ellipsoidal clusters having approximately the same size. This
cluster parameter updating step is alternated with the update
of the weighting coefficients until a convergence criterion
is met.

Algorithm: Gustafson-Kessel Algorithm

Input: Query images
Output: Segmented regions
Step 1: Start the process.

Step 2: Compute the cluster centres and covariance matrices
using the fuzzifier.

Step 3: Initialize the clusters.

Step 4: Determine the maximum number of iterations.

Step 5: Calculate the covariance matrices.

Step 6: Calculate the distance norms using equation 5.1.

Step 7: Update the coefficients until the convergence
criterion is met.

Step 8: Repeat steps 3 to 6.
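
The steps above can be sketched for two-dimensional data as
follows. This is a minimal illustration that assumes the
standard Gustafson-Kessel update formulas: centres are
membership-weighted means, each cluster has a fuzzy
covariance matrix F, and the distance is induced by the
matrix A = det(F)^(1/2) * inverse(F), which has determinant
1. The function name, the initial memberships and the sample
points are hypothetical, and the sketch is not the MATLAB
implementation evaluated in this paper.

```python
def gk_cluster_2d(points, memberships, m=2.0, max_iter=100, tol=1e-6):
    """Gustafson-Kessel clustering of 2-D points.

    memberships[i][k] is the initial degree to which point k
    belongs to cluster i.
    """
    n_clusters, n_points = len(memberships), len(points)
    centres = []
    for _ in range(max_iter):
        centres, dist2 = [], []
        for i in range(n_clusters):
            w = [u ** m for u in memberships[i]]
            total = sum(w)
            # Cluster centre: membership-weighted mean of the data.
            cx = sum(wk * x for wk, (x, _) in zip(w, points)) / total
            cy = sum(wk * y for wk, (_, y) in zip(w, points)) / total
            centres.append((cx, cy))
            # Fuzzy covariance matrix F = [[a, b], [b, d]].
            a = sum(wk * (x - cx) ** 2 for wk, (x, _) in zip(w, points)) / total
            d = sum(wk * (y - cy) ** 2 for wk, (_, y) in zip(w, points)) / total
            b = sum(wk * (x - cx) * (y - cy)
                    for wk, (x, y) in zip(w, points)) / total
            det = max(a * d - b * b, 1e-12)  # guard against a singular F
            s = det ** 0.5
            # A = sqrt(det(F)) * inverse(F), so that det(A) = 1.
            a11, a12, a22 = d / s, -b / s, a / s
            row = []
            for x, y in points:
                ex, ey = x - cx, y - cy
                q = a11 * ex * ex + 2.0 * a12 * ex * ey + a22 * ey * ey
                row.append(max(q, 1e-12))
            dist2.append(row)
        # Membership update: same form as fuzzy c-means, with the GK distance.
        new = [[1.0 / sum((dist2[i][k] / dist2[j][k]) ** (1.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_points)]
               for i in range(n_clusters)]
        change = max(abs(new[i][k] - memberships[i][k])
                     for i in range(n_clusters) for k in range(n_points))
        memberships = new
        if change < tol:
            break
    return centres, memberships


# One elongated horizontal cluster and one elongated vertical cluster.
points = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 0.1),
          (10, 5), (10.1, 6), (9.9, 7), (10, 8), (10.1, 9)]
initial = [[0.9] * 5 + [0.1] * 5, [0.1] * 5 + [0.9] * 5]

centres, u = gk_cluster_2d(points, initial)
print(centres)
```

The per-cluster matrix A stretches the distance along the
short axis of each cluster, which is what allows elongated
(ellipsoidal) clusters to be separated.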



V. SIMILARITY MEASUREMENT 

5.1 HAUSDORFF DISTANCE

The Hausdorff distance is defined in terms of the directed
Hausdorff distance h(A, B) from a point set A to a point set
B. This function finds the point of A that is farthest from
any point of B and measures the distance from that point to
its nearest neighbour in B.

$$h(A, B) = \max_{a \in A} \left\{ \min_{b \in B} \left\{ d(a, b) \right\} \right\} \tag{5.1}$$
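
Equation (5.1) can be evaluated directly, as in the following
minimal sketch. It assumes that d(a, b) is the Euclidean
distance and that A and B are small point sets; the symmetric
variant, the larger of the two directed distances, is the
usual definition and is added here for illustration. The
function names and sample points are hypothetical.

```python
import math


def directed_hausdorff(A, B):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance: the larger of h(A, B) and h(B, A)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]

print(directed_hausdorff(A, B))  # 1.0
print(directed_hausdorff(B, A))  # 3.0
print(hausdorff(A, B))           # 3.0
```

The example shows that the directed distance is not
symmetric: h(A, B) and h(B, A) generally differ.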

5.3 ACCURACY 

Accuracy refers to the degree of conformity and
correctness of something when compared to a true or absolute
value. Accuracy is calculated as the ratio of the number of
correctly classified images (true positives plus true
negatives) to the total number of database images used for
comparison. This performance evaluation is conducted on the
Corel database. The accuracy is calculated as follows:

$$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \tag{5.2}$$

5.4 PRECISION 

Precision refers to exactness, that is, how
consistently a result is correct. The precision value is
determined from the true positive and false positive counts.
The precision of image retrieval is the percentage of images
retrieved correctly among the total number of images
retrieved as output. The precision is calculated as follows:

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{5.3}$$

5.5 RECALL 

The recall value is determined from the true positive
and false negative counts. Recall in this context is also
referred to as the True Positive Rate. It is the fraction of
relevant instances that are retrieved, that is, the ratio of
correctly retrieved images to the total number of relevant
images in the database. The recall is calculated as follows:

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{5.4}$$
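
The three measures can be computed from the four counts as in
the following minimal sketch. The function name and the
counts used in the example are hypothetical and are not
results reported in this paper.

```python
def retrieval_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Hypothetical counts for one query over a 100-image database.
accuracy, precision, recall = retrieval_metrics(tp=18, tn=70, fp=2, fn=10)
print(accuracy, precision, recall)
```

A query with no retrieved images or no relevant images would
make a denominator zero, so a practical implementation would
need to handle that case separately.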



VI. EXPERIMENTATION & RESULTS

The proposed methodology is evaluated experimentally
on the Corel data set. Images of different sizes in JPEG
format are considered in this data set. The simulation was
conducted in the MATLAB environment, which retrieves the
images present in the database for the query image that has
been submitted. The content retrieval process is implemented
using the MATLAB toolkit. This dataset contains image
features extracted from a Corel image collection. Four sets
of features are available based on the color histogram, color
histogram layout, color moments, and co-occurrence. There are
10,000 images in 100 categories. Every category contains 100
images of size 192x128 or 128x192 in JPEG format. All images
come from Corel Gallery Magic 200,000 (8 CDs). The first
5,000 images form the Corel-5K dataset, and all of the 10,000
images form the Corel-10K dataset. The dataset is used only
for academic communication and cannot be used in commercial
products. The Corel-5K and Corel-10K datasets are used in the
algorithms of the multi-texton histogram (MTH),
micro-structure descriptor (MSD) and color difference
histogram (CDH), which have been accepted for publication in
Pattern Recognition. The categories of images that are
considered are given in the following figures.



Figure 4.2 Sample images from Corel 1000

Figure 4.2 shows various sample images from the Corel 1000
image database. The images considered for experimentation are
taken from the different classes of this database. The
proposed Gustafson-Kessel based segmentation and the fuzzy
shell based segmentation are implemented, and the
experimentation has been carried out. A set of training
images including deer, lions and tigers is taken, and the
learning process is done by extracting different features,
which are then grouped together. For the testing part, a lion
image is used as the input image to which the proposed
approach is applied. The similarity is calculated with the
consideration of the different parameters on which
content-based retrieval depends. The input image considered
is given in the following figure 4.3.




Figure 4.3 Testing image 

The above input image is processed at different levels for
efficient and accurate retrieval of the contents stored in
the database. The processes applied to the test input image
are listed and represented in the following figure 4.4.

Initially, the image is converted into a gray scale image
for efficient segmentation. Converting to gray scale allows
accurate segmentation of the given test input image. The
gray scale image is shown below:




Figure 4.5 Grey scale image 

After conversion to gray scale, different segmentation
mechanisms are applied to the input image to extract only
the required part for efficient processing. The image
segmented using the fuzzy shell clustering algorithm is
given as follows:




Figure 4.6 Fuzzy shell clustering 

The image segmented using the Gustafson-Kessel
segmentation approach is given as follows:



Figure 4.7 Gustafson-Kessel segmentation

Based on these segmentation results, the query is
compared with the images stored in the database, and image
retrieval is done on the basis of this comparison. The
retrieval result obtained after the comparison is given as
follows:




Figure 4.8 Retrieval result 

6.2 PERFORMANCE EVALUATION 

The performance evaluation is conducted in terms of
different performance metrics, which are compared at
different levels. The performance measures used to compare
the effectiveness of the proposed mechanism with the existing
approaches are summarized in Table 4.1.



Category    K-means clustering    FCM      Gustafson-Kessel
Accuracy    74.73%                5.35%    90.10%

Table 4.1 Accuracy Comparison

Figure 4.9 Accuracy Comparison

Table 4.1 compares the accuracy values obtained with the
different segmentation approaches. The comparison of image
retrieval under the different segmentation algorithms shows
that the proposed approach improves performance by 10% over
the existing approach.

VII. CONCLUSION

The proposed system uses the Gustafson-Kessel
algorithm and the fuzzy shell clustering algorithm, which
detect various shapes easily. Median filtering is applied in
the proposed work for removing noise, where the value of an
output pixel is the median of the neighbourhood pixels. Shape
features such as moment invariants and texture features such
as Tamura and Haralick features are extracted from the
segmented image. Finally, the Euclidean distance and the
Hausdorff distance are calculated as similarity measures.
According to the similarity value, the images in the database
are retrieved. Experimental results show that the proposed
methodology provides better results than the existing work.